This notebook provides the code for the Cobb-Douglas Regression model.
At the top of each notebook, all necessary libraries are imported before the actual coding starts.
Afterwards, functions are defined that will later be called to simulate the data, print out metrics, and create visualisations.
The functions are the same for each model within a notebook file.
#importing all necessary libraries
import pandas as pd
import numpy as np
import random
import ipywidgets as widgets
from IPython.display import Javascript, display
def DataFunction(alpha, rho, intercept, n, min_val=0, max_val=10):
    """
    This function generates a dataset with three variables.
    The variables ln(L) and ln(K) are randomly drawn from a uniform distribution and lie between 0 and 10 by default.
    A seed is set to 0 to ensure reproducibility.
    The variable ln(Y) is then computed with the Translog production function, using the randomly drawn values and the parameters alpha, rho, and the intercept.
    Afterwards, a dictionary with all values is created.
    The function returns a pandas.DataFrame object with n samples.
    """
    # set the seed for reproducibility
    np.random.seed(0)
    # draw random values for ln(L) and ln(K)
    l_rand = np.random.uniform(min_val, max_val, n)
    k_rand = np.random.uniform(min_val, max_val, n)
    # compute the values for ln(Y) with the Translog production function
    y_TL = intercept + alpha*l_rand + (1-alpha)*k_rand - 1/2*rho*alpha*(1-alpha)*((k_rand-l_rand)**2)
    # collect all variables in a dictionary
    TL_dict = {'ln(Y)': y_TL, 'ln(L)': l_rand, 'ln(K)': k_rand}
    return pd.DataFrame(TL_dict)
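Since ρ only scales the quadratic deviation term, the Translog expression used above collapses to the log-linear Cobb-Douglas form intercept + α·ln(L) + (1−α)·ln(K) when ρ = 0. A minimal sketch with made-up values (not part of the notebook's pipeline):

```python
# Illustrative values, chosen arbitrarily.
alpha, rho, intercept = 0.5, 0.0, 0.1
l, k = 2.0, 3.0   # stand-ins for ln(L) and ln(K)

# Translog formula as used in DataFunction.
translog = intercept + alpha*l + (1-alpha)*k - 1/2*rho*alpha*(1-alpha)*((k-l)**2)
# Log-linear Cobb-Douglas form.
cobb_douglas = intercept + alpha*l + (1-alpha)*k

print(translog == cobb_douglas)  # True, since the quadratic term vanishes for rho = 0
```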
def error_term(sigma, n, mu=0):
    """
    This function randomly draws n values from a normal distribution.
    When the function is called, the standard deviation and the number of values have to be specified.
    The mean of the distribution is 0 by default.
    Values are returned as a numpy.ndarray.
    """
    np.random.seed(0)
    u = np.random.normal(mu, sigma, n)
    return u
from sklearn import metrics
def summary(test_values, predicted_values):
    """
    This function computes the root mean squared error (RMSE) and the mean absolute error (MAE).
    It uses the values from a test set and the fitted values to return a pandas.DataFrame object.
    """
    # compute the RMSE and the MAE with the respective functions from the sklearn library
    RMSE = (metrics.mean_squared_error(test_values, predicted_values))**0.5
    MAE = metrics.mean_absolute_error(test_values, predicted_values)
    # create a dictionary with the metrics
    summary_dict = {'Metric': ['RMSE', 'MAE'],
                    'Value': [RMSE, MAE]}
    return pd.DataFrame(summary_dict)
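Both metrics are simple enough to reproduce by hand, which makes the summary table easy to sanity-check. A short sketch with made-up numbers, using only plain Python:

```python
# Hypothetical toy values, purely for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_hat  = [1.5, 1.5, 3.5, 3.5]
n = len(y_true)

# RMSE: square root of the mean of the squared errors.
rmse = (sum((t - p)**2 for t, p in zip(y_true, y_hat)) / n)**0.5
# MAE: mean of the absolute errors.
mae = sum(abs(t - p) for t, p in zip(y_true, y_hat)) / n

print(rmse, mae)  # every error is 0.5 here, so RMSE = MAE = 0.5
```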
import plotly.express as px
import plotly.graph_objects as go
def Plot_function(model, data):
    """
    This function visualises a regression model for a dataset in three-dimensional space.
    The plotly package is used and enables interaction with the plot.
    Note that the model is refit on the globally defined X and y.
    """
    # define the size of the mesh grid and the margins
    mesh_size = 0.09
    margin = 0
    # fit the model to the exogenous and endogenous variables of the whole dataset
    model.fit(X, y)
    # create a mesh grid to later run the model on
    x_min, x_max = X.min() - margin, X.max() + margin
    y_min, y_max = X.min() - margin, X.max() + margin
    xrange = np.arange(x_min, x_max, mesh_size)
    yrange = np.arange(y_min, y_max, mesh_size)
    xx, yy = np.meshgrid(xrange, yrange)
    # run the model on the grid
    pred = model.predict(np.c_[xx.ravel(), yy.ravel()])
    pred = pred.reshape(xx.shape)
    # generate the plot
    fig = px.scatter_3d(data, x='ln(L)', y='ln(K)', z='ln(Y)')
    fig.update_traces(marker=dict(size=2))
    fig.add_traces(go.Surface(x=xrange, y=yrange, z=pred, name='pred_surface'))
    fig.show()
#To prepare the grid of the heatmap, a list is created to represent the pixels on the grid.
#This double loop creates a list of value combinations between 0 and 10
liste = []
i = 0
while i <= 10:
    j = 0
    while j <= 10:
        liste.append([i, j])
        j += 0.1
    i += 0.1
#The previously created list is then transformed into a numpy.array
data = np.asarray(liste)
#A pandas.DataFrame object is then created and the generated grid values are assigned to the variables ln(L) and ln(K)
columns = ['ln(L)', 'ln(K)']
df_heatmap = pd.DataFrame(data = data, columns = columns)
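The same 0-to-10 grid can be built more compactly with numpy, which also avoids the floating-point drift of repeatedly adding 0.1. A sketch of an equivalent construction, shown only for comparison:

```python
import numpy as np

# 101 exact grid points per axis: 0.0, 0.1, ..., 10.0
vals = np.arange(101) / 10
ii, jj = np.meshgrid(vals, vals, indexing='ij')
grid = np.column_stack([ii.ravel(), jj.ravel()])

print(grid.shape)  # (10201, 2)
```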
def Heatmap_function(model):
    '''
    Returns a heatmap of the given regression model for values between 0 and 10.
    '''
    # define the grid values
    X_heatmap = df_heatmap[['ln(L)','ln(K)']].values
    # compute predictions of the model
    y_heatmap = model.predict(X_heatmap)
    # add the fitted values to the DataFrame with the grid values (flattened to one dimension)
    df_heatmap['predicted ln(Y)'] = np.ravel(y_heatmap)
    heatmap = df_heatmap.pivot(index='ln(L)', columns='ln(K)', values='predicted ln(Y)')
    # display the heatmap
    fig = px.imshow(heatmap, labels=dict(color="predicted value"))
    fig.update_yaxes(autorange=True)
    fig.show()
This is where the actual regression model begins.
The user has to define values for the given parameters by adjusting the sliders and the input cell.
Do not re-execute that cell, because doing so resets the parameters to their defaults.
Afterwards, press the "Run all cells below" button to execute the code below with the desired set of parameters.
As mentioned in the written part of the thesis, the scikit-learn library by Pedregosa et al. (2011) is used to code the different regression models.
The documentation of the library can be accessed with the following link: https://scikit-learn.org/stable/
Functions that are taken from this package will be explained as they occur.
intercept_slider = widgets.FloatSlider(value=0.1, min=0.1, max=1, step=0.1, description='Intercept')
alpha_slider = widgets.FloatSlider(value=0.5, min=0.5, max=1, step=0.1, description='α')
rho_slider = widgets.FloatSlider(value=0, min= 0, max= 1 , step=0.1, description='ρ')
sigma_slider = widgets.FloatSlider(value=1, min=0.5, max=1.5, step=0.25, description='σ')
n_input = widgets.IntText(value = 125, description = 'Samples')
display(intercept_slider,alpha_slider,rho_slider,sigma_slider, n_input)
def run_all(ev):
    display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1, IPython.notebook.ncells())'))
button = widgets.Button(description="Run all cells below")
button.on_click(run_all)
display(button)
#calling the data function on the parameter values and assigning the result to the variable data
data = DataFunction(alpha = alpha_slider.value,
                    intercept = intercept_slider.value,
                    n = n_input.value,
                    rho = rho_slider.value)
#getting the first 5 rows of the DataFrame
data.head()
#adding the error term to the values of the endogenous variable
data['ln(Y)'] += error_term(sigma = sigma_slider.value,
                            n = n_input.value)
#getting the first 5 rows of the new DataFrame
data.head()
#assigning the exogenous variables to X and the endogenous variable to y
X = data[['ln(L)','ln(K)']].values
y = data[['ln(Y)']].values
The train_test_split function splits vectors and matrices into a training set and a test set.
In this case the matrix X and the vector y are split.
The test_size is set to 25% of the whole data.
random_state = 0 sets a random seed to make the split sets reproducible, since the data is shuffled before splitting (Pedregosa et al., 2011).
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
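The split behaviour can be illustrated on made-up data: with 20 samples and test_size = 0.25, five observations end up in the test set. A sketch, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data, purely to illustrate the 75/25 split.
X_demo = np.arange(40).reshape(20, 2)   # 20 samples, 2 features
y_demo = np.arange(20)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(X_tr.shape, X_te.shape)  # (15, 2) (5, 2)
```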
Next, the regression model is instantiated using the LinearRegression class.
The LinearRegression class implements a linear regression model that is estimated with the ordinary least squares method.
Afterwards, the model is fit to the training data (Pedregosa et al., 2011).
from sklearn.linear_model import LinearRegression
LinearReg = LinearRegression()
LinearReg.fit(X_train, y_train)
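Ordinary least squares has a closed-form solution, so the coefficients that LinearRegression returns can be cross-checked against a direct least-squares solve. A sketch on hypothetical toy data (not the thesis data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with a known linear relationship.
rng = np.random.default_rng(1)
X_toy = rng.uniform(0, 10, size=(50, 2))
y_toy = 0.3 + 0.6*X_toy[:, 0] + 0.4*X_toy[:, 1]

# Closed-form OLS: solve the least-squares problem with an intercept column.
A = np.column_stack([np.ones(len(X_toy)), X_toy])
beta = np.linalg.lstsq(A, y_toy, rcond=None)[0]

model = LinearRegression().fit(X_toy, y_toy)
print(np.allclose(beta, [model.intercept_, *model.coef_]))  # True
```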
#accessing the estimated coefficients of the model
intercept = LinearReg.intercept_[0]
coefficients = LinearReg.coef_[0,:]
print('Intercept:',round(intercept,4))
print('Coefficients:',round(coefficients[0],4),',',round(coefficients[1],4))
print('Sum of coefficients:', round(coefficients[0] + coefficients[1], 4))
#use the fitted model to predict values of the endogenous variable for the test set
y_pred = LinearReg.predict(X_test)
#print RMSE and MAE, using the summary function
summary(test_values=y_test, predicted_values=y_pred)
Plot_function(model=LinearReg, data=data)
Heatmap_function(model=LinearReg)